Overview

Dataset statistics

Number of variables: 16
Number of observations: 287544
Missing cells: 609
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 17.7 MiB
Average record size in memory: 64.4 B
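
The missing-cell percentage can be cross-checked against the per-variable missing counts reported further down (Region 605, Country 2, CountryCode 2). The following is a minimal sketch in plain Python; the helper name `missing_cell_percentage` is illustrative and not part of pandas-profiling.

```python
# Figures copied from this report; the helper name is illustrative only.
N_VARIABLES = 16
N_OBSERVATIONS = 287544
MISSING_BY_VARIABLE = {"Region": 605, "Country": 2, "CountryCode": 2}


def missing_cell_percentage(missing_cells, n_rows, n_cols):
    """Return missing cells as a percentage of all cells (rows x columns)."""
    return 100.0 * missing_cells / (n_rows * n_cols)


total_missing = sum(MISSING_BY_VARIABLE.values())
pct = missing_cell_percentage(total_missing, N_OBSERVATIONS, N_VARIABLES)

print(total_missing)  # 609, matching "Missing cells"
print(pct < 0.1)      # True, consistent with "< 0.1%"
```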

Variable types

Numeric: 9
Categorical: 7

Warnings

SC has a high cardinality: 2649 distinct values
Country has a high cardinality: 177 distinct values
CountryCode has a high cardinality: 177 distinct values
ArtsHumanities is highly skewed (γ1 = 42.92297734)
TCperYear is highly skewed (γ1 = 89.73195513)
NumAuthors is highly skewed (γ1 = 21.17953762)
ArtsHumanities has 287305 (99.9%) zeros
LifeSciencesBiomedicine has 242657 (84.4%) zeros
PhysicalSciences has 223642 (77.8%) zeros
SocialSciences has 279391 (97.2%) zeros
Technology has 54360 (18.9%) zeros
TCperYear has 80888 (28.1%) zeros
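
As a rough illustration of how such flags might be derived, the sketch below applies simple threshold rules to figures from this report. The thresholds (20 for skewness, 50 for cardinality) are assumptions chosen to be consistent with which variables are flagged here; the report itself does not state them, and the function names are hypothetical. The `skewness` helper uses the uncorrected population formula, which may differ slightly from the estimator behind the reported γ1 values.

```python
# Assumed thresholds: not stated in the report, chosen to match its flags.
SKEWNESS_THRESHOLD = 20
CARDINALITY_THRESHOLD = 50


def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def is_highly_skewed(gamma1, threshold=SKEWNESS_THRESHOLD):
    """Flag a variable whose absolute skewness exceeds the threshold."""
    return abs(gamma1) > threshold


def has_high_cardinality(n_distinct, threshold=CARDINALITY_THRESHOLD):
    """Flag a categorical variable with more distinct values than the threshold."""
    return n_distinct > threshold


def zero_percentage(n_zeros, n_rows):
    """Share of zero values, in percent."""
    return 100.0 * n_zeros / n_rows


# Skewness values and distinct counts copied from the report.
print(is_highly_skewed(21.17953762))   # NumAuthors: flagged
print(is_highly_skewed(6.88661225))    # SocialSciences: not flagged
print(has_high_cardinality(177))       # Country: flagged
print(has_high_cardinality(9))         # Region: not flagged
print(round(zero_percentage(287305, 287544), 1))  # ArtsHumanities zeros (%)
```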

Reproduction

Analysis started: 2021-01-14 14:46:17.824718
Analysis finished: 2021-01-14 14:46:50.315475
Duration: 32.49 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml
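
A report of this kind could be regenerated along the following lines. This is a sketch, not the original script: the function name, CSV path and report title are placeholders, and it assumes pandas and pandas-profiling v2.x are installed. Passing the downloaded config.yaml through `config_file` is optional.

```python
def build_profile_report(csv_path, output_path="report.html", config_file=None):
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imported lazily so the function can be defined without the libraries.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    kwargs = {"title": "Dataset profile"}  # placeholder title
    if config_file is not None:
        kwargs["config_file"] = config_file
    report = ProfileReport(df, **kwargs)
    report.to_file(output_path)
    return output_path


# Example call (path is hypothetical):
# build_profile_report("publications.csv", config_file="config.yaml")
```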

Variables

PY
Real number (ℝ≥0)

Distinct: 29
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2010.665651
Minimum: 1990
Maximum: 2018
Zeros: 0
Zeros (%): 0.0%
Memory size: 561.7 KiB

Quantile statistics

Minimum: 1990
5-th percentile: 1999
Q1: 2006
Median: 2012
Q3: 2016
95-th percentile: 2018
Maximum: 2018
Range: 28
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 6.245364613
Coefficient of variation (CV): 0.003106117922
Kurtosis: -0.3513597005
Mean: 2010.665651
Median Absolute Deviation (MAD): 5
Skewness: -0.7236493956
Sum: 578154844
Variance: 39.00457915
Monotonicity: Not monotonic
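
Several of the PY figures are related by simple arithmetic. A short check, using only values copied from this section, might look like this (variable names are illustrative):

```python
import math

# Values copied from the PY statistics in this report.
mean, std, variance = 2010.665651, 6.245364613, 39.00457915
minimum, q1, q3, maximum = 1990, 2006, 2016, 2018

cv = std / mean                  # coefficient of variation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(value_range, iqr)
print(math.isclose(cv, 0.003106117922, rel_tol=1e-6))  # matches reported CV
print(math.isclose(std ** 2, variance, rel_tol=1e-6))  # variance is std squared
```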
Histogram with fixed size bins (bins=29)
Common values
Value              Count    Frequency (%)
2018               34202    11.9%
2017               28870    10.0%
2016               22408    7.8%
2015               20173    7.0%
2014               17061    5.9%
2013               15069    5.2%
2012               14000    4.9%
2009               14000    4.9%
2008               12879    4.5%
2011               12229    4.3%
Other values (19)  96653    33.6%

Minimum 5 values
Value              Count    Frequency (%)
1990               72       < 0.1%
1991               391      0.1%
1992               674      0.2%
1993               834      0.3%
1994               994      0.3%

Maximum 5 values
Value              Count    Frequency (%)
2018               34202    11.9%
2017               28870    10.0%
2016               22408    7.8%
2015               20173    7.0%
2014               17061    5.9%

SC
Categorical

HIGH CARDINALITY

Distinct: 2649
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 662.4 KiB

Computer Science: 44679
Computer Science; Engineering: 26207
Engineering: 21091
Automation & Control Systems; Engineering: 5863
Physics: 5589
Other values (2644): 184115

Length

Max length: 188
Median length: 29
Mean length: 33.72064449
Min length: 3

Characters and Unicode

Total characters: 9696169
Distinct characters: 49
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 592
Unique (%): 0.2%

Sample

1st row: Computer Science; Engineering
2nd row: Computer Science; Engineering
3rd row: Computer Science; Engineering
4th row: Computer Science; Engineering
5th row: Computer Science; Engineering
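
The sample rows suggest that SC holds one or more subject categories separated by semicolons. Assuming that reading is correct, a minimal sketch for splitting the field and counting individual categories could look like this; `split_subject_categories` is a hypothetical helper and the three input strings are values that appear in this section.

```python
from collections import Counter


def split_subject_categories(sc):
    """Split a semicolon-separated SC string into trimmed category names."""
    return [part.strip() for part in sc.split(";") if part.strip()]


# Three SC values taken from this report, used as a tiny illustrative sample.
sample = [
    "Computer Science; Engineering",
    "Computer Science",
    "Automation & Control Systems; Engineering",
]

counts = Counter(
    category for sc in sample for category in split_subject_categories(sc)
)
for category, count in counts.most_common():
    print(category, count)
```

Counting per category rather than per combined string would reduce the 2649 distinct combinations to a much smaller set of individual subject names.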
Common values
Value                                              Count    Frequency (%)
Computer Science                                   44679    15.5%
Computer Science; Engineering                      26207    9.1%
Engineering                                        21091    7.3%
Automation & Control Systems; Engineering          5863     2.0%
Physics                                            5589     1.9%
Computer Science; Neurosciences & Neurology        4587     1.6%
Science & Technology - Other Topics                4478     1.6%
Computer Science; Engineering; Telecommunications  4101     1.4%
Automation & Control Systems; Computer Science     4022     1.4%
Mathematics                                        3900     1.4%
Other values (2639)                                163027   56.7%

Histogram of lengths of the category

Value               Count    Frequency (%)
science             170102   15.3%
(blank)             150429   13.5%
computer            123626   11.1%
engineering         121970   11.0%
technology          28080    2.5%
control             26581    2.4%
systems             26581    2.4%
automation          26581    2.4%
sciences            15715    1.4%
physics             15400    1.4%
Other values (201)  407409   36.6%

Most occurring characters

Value              Count     Frequency (%)
e                  1146181   11.8%
n                  874397    9.0%
(space)            824930    8.5%
i                  768671    7.9%
c                  650290    6.7%
o                  618226    6.4%
t                  508312    5.2%
r                  492410    5.1%
g                  408748    4.2%
s                  356006    3.7%
Other values (39)  3047998   31.4%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   7515874   77.5%
Uppercase Letter   962074    9.9%
Space Separator    824930    8.5%
Other Punctuation  381472    3.9%
Dash Punctuation   11819     0.1%

Most frequent character per category

Lowercase Letter
Value              Count     Frequency (%)
e                  1146181   15.3%
n                  874397    11.6%
i                  768671    10.2%
c                  650290    8.7%
o                  618226    8.2%
t                  508312    6.8%
r                  492410    6.6%
g                  408748    5.4%
s                  356006    4.7%
m                  334039    4.4%
Other values (13)  1358594   18.1%

Uppercase Letter
Value              Count     Frequency (%)
S                  224139    23.3%
C                  171583    17.8%
E                  160617    16.7%
M                  70224     7.3%
T                  59958     6.2%
A                  45338     4.7%
P                  39954     4.2%
I                  39125     4.1%
R                  35080     3.6%
O                  30068     3.1%
Other values (11)  85988     8.9%

Other Punctuation
Value              Count     Frequency (%)
;                  239051    62.7%
&                  138703    36.4%
,                  3718      1.0%

Space Separator
Value              Count     Frequency (%)
(space)            824930    100.0%

Dash Punctuation
Value              Count     Frequency (%)
-                  11819     100.0%

Most occurring scripts

Value              Count     Frequency (%)
Latin              8477948   87.4%
Common             1218221   12.6%

Most frequent character per script

Latin
Value              Count     Frequency (%)
e                  1146181   13.5%
n                  874397    10.3%
i                  768671    9.1%
c                  650290    7.7%
o                  618226    7.3%
t                  508312    6.0%
r                  492410    5.8%
g                  408748    4.8%
s                  356006    4.2%
m                  334039    3.9%
Other values (34)  2320668   27.4%

Common
Value              Count     Frequency (%)
(space)            824930    67.7%
;                  239051    19.6%
&                  138703    11.4%
-                  11819     1.0%
,                  3718      0.3%

Most occurring blocks

Value              Count     Frequency (%)
ASCII              9696169   100.0%

Most frequent character per block

ASCII
Value              Count     Frequency (%)
e                  1146181   11.8%
n                  874397    9.0%
(space)            824930    8.5%
i                  768671    7.9%
c                  650290    6.7%
o                  618226    6.4%
t                  508312    5.2%
r                  492410    5.1%
g                  408748    4.2%
s                  356006    3.7%
Other values (39)  3047998   31.4%

ArtsHumanities
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0005379135351
Minimum: 0
Maximum: 1
Zeros: 287305
Zeros (%): 99.9%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 1
Range: 1
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.02062099833
Coefficient of variation (CV): 38.33515424
Kurtosis: 1954.44341
Mean: 0.0005379135351
Median Absolute Deviation (MAD): 0
Skewness: 42.92297734
Sum: 154.6738095
Variance: 0.0004252255723
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Common values
Value         Count    Frequency (%)
0             287305   99.9%
1             96       < 0.1%
0.5           91       < 0.1%
0.3333333333  20       < 0.1%
0.25          13       < 0.1%
0.2           7        < 0.1%
0.1666666667  6        < 0.1%
0.1428571429  6        < 0.1%

Minimum 5 values
Value         Count    Frequency (%)
0             287305   99.9%
0.1428571429  6        < 0.1%
0.1666666667  6        < 0.1%
0.2           7        < 0.1%
0.25          13       < 0.1%

Maximum 5 values
Value         Count    Frequency (%)
1             96       < 0.1%
0.5           91       < 0.1%
0.3333333333  20       < 0.1%
0.25          13       < 0.1%
0.2           7        < 0.1%

LifeSciencesBiomedicine
Real number (ℝ≥0)

ZEROS

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1095826521
Minimum: 0
Maximum: 1
Zeros: 242657
Zeros (%): 84.4%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2785685298
Coefficient of variation (CV): 2.542086036
Kurtosis: 4.756714659
Mean: 0.1095826521
Median Absolute Deviation (MAD): 0
Skewness: 2.479240321
Sum: 31509.83413
Variance: 0.07760042582
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
Common values
Value             Count    Frequency (%)
0                 242657   84.4%
1                 20320    7.1%
0.5               12311    4.3%
0.3333333333      7348     2.6%
0.6666666667      2388     0.8%
0.25              1303     0.5%
0.6               698      0.2%
0.75              126      < 0.1%
0.2               112      < 0.1%
0.1666666667      83       < 0.1%
Other values (5)  198      0.1%

Minimum 5 values
Value             Count    Frequency (%)
0                 242657   84.4%
0.1111111111      2        < 0.1%
0.1666666667      83       < 0.1%
0.2               112      < 0.1%
0.25              1303     0.5%

Maximum 5 values
Value             Count    Frequency (%)
1                 20320    7.1%
0.8333333333      74       < 0.1%
0.8               23       < 0.1%
0.75              126      < 0.1%
0.6666666667      2388     0.8%

PhysicalSciences
Real number (ℝ≥0)

ZEROS

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1482194167
Minimum: 0
Maximum: 1
Zeros: 223642
Zeros (%): 77.8%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.30871378
Coefficient of variation (CV): 2.082816049
Kurtosis: 2.41000895
Mean: 0.1482194167
Median Absolute Deviation (MAD): 0
Skewness: 1.956151124
Sum: 42619.60397
Variance: 0.09530419794
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Common values
Value             Count    Frequency (%)
0                 223642   77.8%
1                 25316    8.8%
0.5               17618    6.1%
0.3333333333      10486    3.6%
0.6666666667      5237     1.8%
0.25              3237     1.1%
0.2               867      0.3%
0.4               792      0.3%
0.75              178      0.1%
0.6               68       < 0.1%
Other values (6)  103      < 0.1%

Minimum 5 values
Value             Count    Frequency (%)
0                 223642   77.8%
0.1428571429      36       < 0.1%
0.1666666667      36       < 0.1%
0.2               867      0.3%
0.2222222222      2        < 0.1%

Maximum 5 values
Value             Count    Frequency (%)
1                 25316    8.8%
0.8               27       < 0.1%
0.75              178      0.1%
0.6666666667      5237     1.8%
0.6               68       < 0.1%

SocialSciences
Real number (ℝ≥0)

ZEROS

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.01971971495
Minimum: 0
Maximum: 1
Zeros: 279391
Zeros (%): 97.2%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 1
Range: 1
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.1256870244
Coefficient of variation (CV): 6.373673489
Kurtosis: 48.1070608
Mean: 0.01971971495
Median Absolute Deviation (MAD): 0
Skewness: 6.88661225
Sum: 5670.285714
Variance: 0.01579722809
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
Common values
Value             Count    Frequency (%)
0                 279391   97.2%
1                 3751     1.3%
0.5               2263     0.8%
0.3333333333      1047     0.4%
0.25              623      0.2%
0.6666666667      335      0.1%
0.2               38       < 0.1%
0.75              29       < 0.1%
0.6               28       < 0.1%
0.4               22       < 0.1%
Other values (4)  17       < 0.1%

Minimum 5 values
Value             Count    Frequency (%)
0                 279391   97.2%
0.1428571429      1        < 0.1%
0.1666666667      10       < 0.1%
0.2               38       < 0.1%
0.25              623      0.2%

Maximum 5 values
Value             Count    Frequency (%)
1                 3751     1.3%
0.8               1        < 0.1%
0.75              29       < 0.1%
0.6666666667      335      0.1%
0.6               28       < 0.1%

Technology
Real number (ℝ≥0)

ZEROS

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7219403026
Minimum: 0
Maximum: 1
Zeros: 54360
Zeros (%): 18.9%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.5
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 0.3965653732
Coefficient of variation (CV): 0.5493049381
Kurtosis: -0.7235442184
Mean: 0.7219403026
Median Absolute Deviation (MAD): 0
Skewness: -0.9796862641
Sum: 207589.6024
Variance: 0.1572640953
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)
Common values
Value             Count    Frequency (%)
1                 179457   62.4%
0                 54360    18.9%
0.5               27308    9.5%
0.6666666667      12278    4.3%
0.3333333333      7971     2.8%
0.75              3390     1.2%
0.6               780      0.3%
0.25              765      0.3%
0.2               729      0.3%
0.8               180      0.1%
Other values (8)  326      0.1%

Minimum 5 values
Value             Count    Frequency (%)
0                 54360    18.9%
0.1428571429      1        < 0.1%
0.1666666667      74       < 0.1%
0.2               729      0.3%
0.25              765      0.3%

Maximum 5 values
Value             Count    Frequency (%)
1                 179457   62.4%
0.8571428571      36       < 0.1%
0.8333333333      21       < 0.1%
0.8               180      0.1%
0.75              3390     1.2%

ComputerScience
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

0: 163918
1: 123626

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 287544
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1
Common values
Value           Count    Frequency (%)
0               163918   57.0%
1               123626   43.0%

Histogram of lengths of the category

Value           Count    Frequency (%)
0               163918   57.0%
1               123626   43.0%

Most occurring characters

Value           Count    Frequency (%)
0               163918   57.0%
1               123626   43.0%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  287544   100.0%

Most frequent character per category

Decimal Number
Value           Count    Frequency (%)
0               163918   57.0%
1               123626   43.0%

Most occurring scripts

Value           Count    Frequency (%)
Common          287544   100.0%

Most frequent character per script

Common
Value           Count    Frequency (%)
0               163918   57.0%
1               123626   43.0%

Most occurring blocks

Value           Count    Frequency (%)
ASCII           287544   100.0%

Most frequent character per block

ASCII
Value           Count    Frequency (%)
0               163918   57.0%
1               123626   43.0%

Health
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

0: 256531
1: 31013

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 287544
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0
Common values
Value           Count    Frequency (%)
0               256531   89.2%
1               31013    10.8%

Histogram of lengths of the category

Value           Count    Frequency (%)
0               256531   89.2%
1               31013    10.8%

Most occurring characters

Value           Count    Frequency (%)
0               256531   89.2%
1               31013    10.8%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  287544   100.0%

Most frequent character per category

Decimal Number
Value           Count    Frequency (%)
0               256531   89.2%
1               31013    10.8%

Most occurring scripts

Value           Count    Frequency (%)
Common          287544   100.0%

Most frequent character per script

Common
Value           Count    Frequency (%)
0               256531   89.2%
1               31013    10.8%

Most occurring blocks

Value           Count    Frequency (%)
ASCII           287544   100.0%

Most frequent character per block

ASCII
Value           Count    Frequency (%)
0               256531   89.2%
1               31013    10.8%

NR
Real number (ℝ≥0)

Distinct: 402
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.23000306
Minimum: 1
Maximum: 1972
Zeros: 0
Zeros (%): 0.0%
Memory size: 561.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 14
Median: 24
Q3: 38
95-th percentile: 72
Maximum: 1972
Range: 1971
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 27.74537999
Coefficient of variation (CV): 0.9178093676
Kurtosis: 234.6164263
Mean: 30.23000306
Median Absolute Deviation (MAD): 11
Skewness: 7.514348762
Sum: 8692456
Variance: 769.8061108
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count    Frequency (%)
12                  8370     2.9%
15                  8352     2.9%
10                  8189     2.8%
16                  7976     2.8%
11                  7953     2.8%
14                  7951     2.8%
13                  7782     2.7%
18                  7504     2.6%
17                  7494     2.6%
20                  7478     2.6%
Other values (392)  208495   72.5%

Minimum 5 values
Value               Count    Frequency (%)
1                   251      0.1%
2                   663      0.2%
3                   1496     0.5%
4                   2643     0.9%
5                   4155     1.4%

Maximum 5 values
Value               Count    Frequency (%)
1972                2        < 0.1%
1073                1        < 0.1%
882                 1        < 0.1%
863                 3        < 0.1%
770                 1        < 0.1%

TCperYear
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 3692
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.762959954
Minimum: 0
Maximum: 1587.8
Zeros: 80888
Zeros (%): 28.1%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0.5
Q3: 1.666666667
95-th percentile: 7
Maximum: 1587.8
Range: 1587.8
Interquartile range (IQR): 1.666666667

Descriptive statistics

Standard deviation: 7.645493537
Coefficient of variation (CV): 4.336736928
Kurtosis: 15083.66167
Mean: 1.762959954
Median Absolute Deviation (MAD): 0.5
Skewness: 89.73195513
Sum: 506928.5571
Variance: 58.45357142
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                Count    Frequency (%)
0                    80888    28.1%
0.5                  10892    3.8%
1                    10531    3.7%
0.3333333333         7514     2.6%
0.25                 5183     1.8%
2                    4911     1.7%
0.6666666667         4726     1.6%
1.5                  4027     1.4%
0.2                  3768     1.3%
0.1666666667         2973     1.0%
Other values (3682)  152131   52.9%

Minimum 5 values
Value                Count    Frequency (%)
0                    80888    28.1%
0.03333333333        6        < 0.1%
0.03448275862        27       < 0.1%
0.03571428571        31       < 0.1%
0.03703703704        50       < 0.1%

Maximum 5 values
Value                Count    Frequency (%)
1587.8               2        < 0.1%
932.6                1        < 0.1%
853.5                1        < 0.1%
746                  1        < 0.1%
558.6                1        < 0.1%

NumAuthors
Real number (ℝ≥0)

SKEWED

Distinct: 134
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.59242759
Minimum: 0
Maximum: 3049
Zeros: 1
Zeros (%): < 0.1%
Memory size: 561.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
Median: 3
Q3: 4
95-th percentile: 7
Maximum: 3049
Range: 3049
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 124.5018182
Coefficient of variation (CV): 11.75385124
Kurtosis: 467.5766407
Mean: 10.59242759
Median Absolute Deviation (MAD): 1
Skewness: 21.17953762
Sum: 3045789
Variance: 15500.70275
Monotonicity: Not monotonic
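
The gap between the mean (10.59) and the median (3), and between the standard deviation (124.5) and the MAD (1), is what a small number of very large author counts would produce. The sketch below illustrates the effect on a small made-up sample; the data and the helper name are hypothetical, and only the maximum (3049) is borrowed from the report.

```python
from statistics import mean, median, pstdev


def median_abs_deviation(values):
    """Median of absolute deviations from the median (unscaled MAD)."""
    med = median(values)
    return median([abs(v - med) for v in values])


# Hypothetical author counts: mostly small teams plus one very large one.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 3049]

print("mean:", mean(sample))      # pulled far upwards by the outlier
print("median:", median(sample))  # stays at the typical team size
print("std:", pstdev(sample))     # dominated by the outlier
print("MAD:", median_abs_deviation(sample))
```

For a variable this skewed, the median and MAD describe a typical record better than the mean and standard deviation do.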
Histogram with fixed size bins (bins=50)
Common values
Value               Count    Frequency (%)
3                   76402    26.6%
2                   66743    23.2%
4                   56909    19.8%
5                   32157    11.2%
1                   18916    6.6%
6                   16108    5.6%
7                   7745     2.7%
8                   4018     1.4%
9                   2226     0.8%
10                  1352     0.5%
Other values (124)  4968     1.7%

Minimum 5 values
Value               Count    Frequency (%)
0                   1        < 0.1%
1                   18916    6.6%
2                   66743    23.2%
3                   76402    26.6%
4                   56909    19.8%

Maximum 5 values
Value               Count    Frequency (%)
3049                39       < 0.1%
3041                39       < 0.1%
2908                41       < 0.1%
2900                42       < 0.1%
2888                41       < 0.1%

Organisation
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 281.0 KiB

Collaboration: 146062
Academia: 130102
Company: 11380

Length

Max length: 13
Median length: 13
Mean length: 10.50024344
Min length: 7

Characters and Unicode

Total characters: 3019282
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Collaboration
2nd row: Collaboration
3rd row: Collaboration
4th row: Collaboration
5th row: Academia
Common values
Value             Count     Frequency (%)
Collaboration     146062    50.8%
Academia          130102    45.2%
Company           11380     4.0%

Histogram of lengths of the category

Value             Count     Frequency (%)
collaboration     146062    50.8%
academia          130102    45.2%
company           11380     4.0%

Most occurring characters

Value             Count     Frequency (%)
a                 563708    18.7%
o                 449566    14.9%
l                 292124    9.7%
i                 276164    9.1%
C                 157442    5.2%
n                 157442    5.2%
b                 146062    4.8%
r                 146062    4.8%
t                 146062    4.8%
m                 141482    4.7%
Other values (6)  543168    18.0%

Most occurring categories

Value             Count     Frequency (%)
Lowercase Letter  2731738   90.5%
Uppercase Letter  287544    9.5%

Most frequent character per category

Lowercase Letter
Value             Count     Frequency (%)
a                 563708    20.6%
o                 449566    16.5%
l                 292124    10.7%
i                 276164    10.1%
n                 157442    5.8%
b                 146062    5.3%
r                 146062    5.3%
t                 146062    5.3%
m                 141482    5.2%
c                 130102    4.8%
Other values (4)  282964    10.4%

Uppercase Letter
Value             Count     Frequency (%)
C                 157442    54.8%
A                 130102    45.2%

Most occurring scripts

Value             Count     Frequency (%)
Latin             3019282   100.0%

Most frequent character per script

Latin
Value             Count     Frequency (%)
a                 563708    18.7%
o                 449566    14.9%
l                 292124    9.7%
i                 276164    9.1%
C                 157442    5.2%
n                 157442    5.2%
b                 146062    4.8%
r                 146062    4.8%
t                 146062    4.8%
m                 141482    4.7%
Other values (6)  543168    18.0%

Most occurring blocks

Value             Count     Frequency (%)
ASCII             3019282   100.0%

Most frequent character per block

ASCII
Value             Count     Frequency (%)
a                 563708    18.7%
o                 449566    14.9%
l                 292124    9.7%
i                 276164    9.1%
C                 157442    5.2%
n                 157442    5.2%
b                 146062    4.8%
r                 146062    4.8%
t                 146062    4.8%
m                 141482    4.7%
Other values (6)  543168    18.0%

Region
Categorical

Distinct: 9
Distinct (%): < 0.1%
Missing: 605
Missing (%): 0.2%
Memory size: 281.3 KiB

NorthEast Asia: 87043
Western Europe: 63918
North America: 47520
Eastern Europe to Central Asia: 21930
MiddleEast and North Africa: 19787
Other values (4): 46741

Length

Max length: 30
Median length: 14
Mean length: 17.00028926
Min length: 10

Characters and Unicode

Total characters: 4878046
Distinct characters: 27
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: North America
2nd row: SouthEast Asia and Pacific
3rd row: Western Europe
4th row: Western Europe
5th row: North America
Common values
Value                           Count     Frequency (%)
NorthEast Asia                  87043     30.3%
Western Europe                  63918     22.2%
North America                   47520     16.5%
Eastern Europe to Central Asia  21930     7.6%
MiddleEast and North Africa     19787     6.9%
SouthEast Asia and Pacific      18654     6.5%
South Asia                      15998     5.6%
Latin America and Caribbean     10233     3.6%
Sub Saharan Africa              1856      0.6%
(Missing)                       605       0.2%

Histogram of lengths of the category

Value              Count     Frequency (%)
asia               143625    19.4%
northeast          87043     11.8%
europe             85848     11.6%
north              67307     9.1%
western            63918     8.7%
america            57753     7.8%
and                48674     6.6%
to                 21930     3.0%
eastern            21930     3.0%
central            21930     3.0%
Other values (9)   118914    16.1%

Most occurring characters

Value              Count     Frequency (%)
a                  495960    10.2%
t                  454427    9.3%
(space)            451933    9.3%
r                  439461    9.0%
s                  354957    7.3%
e                  345317    7.1%
i                  300582    6.2%
o                  296780    6.1%
E                  233262    4.8%
A                  223021    4.6%
Other values (17)  1282346   26.3%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   3632361   74.5%
Uppercase Letter   793752    16.3%
Space Separator    451933    9.3%

Most frequent character per category

Lowercase Letter
Value              Count     Frequency (%)
a                  495960    13.7%
t                  454427    12.5%
r                  439461    12.1%
s                  354957    9.8%
e                  345317    9.5%
i                  300582    8.3%
o                  296780    8.2%
h                  190858    5.3%
n                  178774    4.9%
u                  122356    3.4%
Other values (7)   452889    12.5%

Uppercase Letter
Value              Count     Frequency (%)
E                  233262    29.4%
A                  223021    28.1%
N                  154350    19.4%
W                  63918     8.1%
S                  38364     4.8%
C                  32163     4.1%
M                  19787     2.5%
P                  18654     2.4%
L                  10233     1.3%

Space Separator
Value              Count     Frequency (%)
(space)            451933    100.0%

Most occurring scripts

Value              Count     Frequency (%)
Latin              4426113   90.7%
Common             451933    9.3%

Most frequent character per script

Latin
Value              Count     Frequency (%)
a                  495960    11.2%
t                  454427    10.3%
r                  439461    9.9%
s                  354957    8.0%
e                  345317    7.8%
i                  300582    6.8%
o                  296780    6.7%
E                  233262    5.3%
A                  223021    5.0%
h                  190858    4.3%
Other values (16)  1091488   24.7%

Common
Value              Count     Frequency (%)
(space)            451933    100.0%

Most occurring blocks

Value              Count     Frequency (%)
ASCII              4878046   100.0%

Most frequent character per block

ASCII
Value              Count     Frequency (%)
a                  495960    10.2%
t                  454427    9.3%
(space)            451933    9.3%
r                  439461    9.0%
s                  354957    7.3%
e                  345317    7.1%
i                  300582    6.2%
o                  296780    6.1%
E                  233262    4.8%
A                  223021    4.6%
Other values (17)  1282346   26.3%

Country
Categorical

HIGH CARDINALITY

Distinct: 177
Distinct (%): 0.1%
Missing: 2
Missing (%): < 0.1%
Memory size: 568.1 KiB

China: 59292
USA: 39055
India: 14317
United Kingdom: 13769
Japan: 11205
Other values (172): 149904

Length

Max length: 38
Median length: 5
Mean length: 7.178370464
Min length: 3

Characters and Unicode

Total characters: 2064083
Distinct characters: 54
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): < 0.1%

Sample

1st row: USA
2nd row: Australia
3rd row: Belgium
4th row: France
5th row: USA
Common values
Value                      Count     Frequency (%)
China                      59292     20.6%
USA                        39055     13.6%
India                      14317     5.0%
United Kingdom             13769     4.8%
Japan                      11205     3.9%
Iran, Islamic Republic of  10465     3.6%
Canada                     8615      3.0%
Taiwan                     8488      3.0%
Germany                    8407      2.9%
Italy                      8340      2.9%
Other values (167)         105589    36.7%

Histogram of lengths of the category

Value               Count     Frequency (%)
china               59292     16.6%
usa                 39055     10.9%
republic            19978     5.6%
of                  18353     5.1%
united              14348     4.0%
india               14317     4.0%
kingdom             13769     3.9%
japan               11205     3.1%
iran                10465     2.9%
islamic             10465     2.9%
Other values (194)  145467    40.8%

Most occurring characters

Value              Count     Frequency (%)
a                  289540    14.0%
i                  202814    9.8%
n                  197895    9.6%
e                  107316    5.2%
r                  81916     4.0%
l                  78047     3.8%
C                  71924     3.5%
d                  70136     3.4%
(space)            69172     3.4%
h                  67799     3.3%
Other values (44)  827524    40.1%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   1560299   75.6%
Uppercase Letter   416232    20.2%
Space Separator    69172     3.4%
Other Punctuation  18379     0.9%
Dash Punctuation   1         < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count     Frequency (%)
a                  289540    18.6%
i                  202814    13.0%
n                  197895    12.7%
e                  107316    6.9%
r                  81916     5.3%
l                  78047     5.0%
d                  70136     4.5%
h                  67799     4.3%
o                  64066     4.1%
u                  47597     3.1%
Other values (16)  353173    22.6%

Uppercase Letter
Value              Count     Frequency (%)
C                  71924     17.3%
S                  59595     14.3%
U                  54144     13.0%
A                  51769     12.4%
I                  46956     11.3%
R                  24330     5.8%
K                  22217     5.3%
T                  17348     4.2%
J                  11638     2.8%
G                  11506     2.8%
Other values (14)  44805     10.8%

Other Punctuation
Value              Count     Frequency (%)
,                  18353     99.9%
'                  26        0.1%

Space Separator
Value              Count     Frequency (%)
(space)            69172     100.0%

Dash Punctuation
Value              Count     Frequency (%)
-                  1         100.0%

Most occurring scripts

Value              Count     Frequency (%)
Latin              1976531   95.8%
Common             87552     4.2%

Most frequent character per script

Latin
Value              Count     Frequency (%)
a                  289540    14.6%
i                  202814    10.3%
n                  197895    10.0%
e                  107316    5.4%
r                  81916     4.1%
l                  78047     3.9%
C                  71924     3.6%
d                  70136     3.5%
h                  67799     3.4%
o                  64066     3.2%
Other values (40)  745078    37.7%

Common
Value              Count     Frequency (%)
(space)            69172     79.0%
,                  18353     21.0%
'                  26        < 0.1%
-                  1         < 0.1%

Most occurring blocks

Value              Count     Frequency (%)
ASCII              2064083   100.0%

Most frequent character per block

ASCII
Value              Count     Frequency (%)
a                  289540    14.0%
i                  202814    9.8%
n                  197895    9.6%
e                  107316    5.2%
r                  81916     4.0%
l                  78047     3.8%
C                  71924     3.5%
d                  70136     3.4%
(space)            69172     3.4%
h                  67799     3.3%
Other values (44)  827524    40.1%

CountryCode
Categorical

HIGH CARDINALITY

Distinct: 177
Distinct (%): 0.1%
Missing: 2
Missing (%): < 0.1%
Memory size: 568.1 KiB

CHN: 59292
USA: 39055
IND: 14317
GBR: 13769
JPN: 11205
Other values (172): 149904

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 862626
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): < 0.1%

Sample

1st row: USA
2nd row: AUS
3rd row: BEL
4th row: FRA
5th row: USA
Common values
Value               Count    Frequency (%)
CHN                 59292    20.6%
USA                 39055    13.6%
IND                 14317    5.0%
GBR                 13769    4.8%
JPN                 11205    3.9%
IRN                 10465    3.6%
CAN                 8615     3.0%
TWN                 8488     3.0%
DEU                 8407     2.9%
ITA                 8340     2.9%
Other values (167)  105589   36.7%

Histogram of lengths of the category

Value               Count    Frequency (%)
chn                 59292    20.6%
usa                 39055    13.6%
ind                 14317    5.0%
gbr                 13769    4.8%
jpn                 11205    3.9%
irn                 10465    3.6%
can                 8615     3.0%
twn                 8488     3.0%
deu                 8407     2.9%
ita                 8340     2.9%
Other values (167)  105589   36.7%

Most occurring characters

Value              Count    Frequency (%)
N                  123923   14.4%
A                  84438    9.8%
C                  76767    8.9%
S                  70958    8.2%
U                  70889    8.2%
R                  67838    7.9%
H                  65595    7.6%
I                  38163    4.4%
P                  30612    3.5%
T                  30041    3.5%
Other values (16)  203402   23.6%

Most occurring categories

Value              Count    Frequency (%)
Uppercase Letter   862626   100.0%

Most frequent character per category

Uppercase Letter
Value              Count    Frequency (%)
N                  123923   14.4%
A                  84438    9.8%
C                  76767    8.9%
S                  70958    8.2%
U                  70889    8.2%
R                  67838    7.9%
H                  65595    7.6%
I                  38163    4.4%
P                  30612    3.5%
T                  30041    3.5%
Other values (16)  203402   23.6%

Most occurring scripts

Value              Count    Frequency (%)
Latin              862626   100.0%

Most frequent character per script

Latin
Value              Count    Frequency (%)
N                  123923   14.4%
A                  84438    9.8%
C                  76767    8.9%
S                  70958    8.2%
U                  70889    8.2%
R                  67838    7.9%
H                  65595    7.6%
I                  38163    4.4%
P                  30612    3.5%
T                  30041    3.5%
Other values (16)  203402   23.6%

Most occurring blocks

Value              Count    Frequency (%)
ASCII              862626   100.0%

Most frequent character per block

ASCII
Value              Count    Frequency (%)
N                  123923   14.4%
A                  84438    9.8%
C                  76767    8.9%
S                  70958    8.2%
U                  70889    8.2%
R                  67838    7.9%
H                  65595    7.6%
I                  38163    4.4%
P                  30612    3.5%
T                  30041    3.5%
Other values (16)  203402   23.6%

Interactions

[Figures: pairwise interaction plots]
2021-01-14T15:46:47.219907image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-01-14T15:46:47.381407image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-01-14T15:46:47.542866image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-01-14T15:46:47.705273image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-01-14T15:46:47.865263image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-01-14T15:46:48.037329image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the slope of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
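
The calculation described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code the report tool runs: the function name `pearson_r` and the sample lists are made up for the example, and the sample (n - 1) form of covariance and standard deviation is assumed.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample standard deviations (n - 1 in the denominator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (std_x * std_y)

# Illustrative values only (not taken from the dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

r_linear = pearson_r(x, y)                          # exact linear relationship
r_shifted = pearson_r([10 * a + 3 for a in x], y)   # change of location and scale in x
r_negative = pearson_r(x, [-b for b in y])          # reversed direction
```

Rescaling and shifting `x` leaves r unchanged, while negating `y` flips its sign, which matches the invariance property stated above.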

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
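
As a rough sketch of that recipe, the snippet below ranks both variables and then applies the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample data are assumptions for illustration, and tied values are given the mean of their rank positions, which is one common convention.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Illustrative values only: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
```

On this example ρ reaches 1 because the relationship is perfectly monotonic, whereas Pearson's r stays below 1 because the relationship is not linear.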

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
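
The pair-counting definition can be sketched directly. Note the assumptions: this implements the formula exactly as worded here (often called tau-a), tied pairs count as neither concordant nor discordant, and library implementations may apply a tie correction that this sketch omits. The function name `kendall_tau_a` and the sample data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables order this pair the same way
        elif product < 0:
            discordant += 1   # the variables order this pair in opposite ways
        # product == 0 means a tie in x or y; it contributes to neither count
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Illustrative values only.
tau_same = kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4])
tau_reversed = kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1])
tau_one_swap = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])  # 5 concordant, 1 discordant
```

The quadratic loop over all pairs is fine for a small illustration but would be slow on a dataset of this size.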

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.
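
For readers who want to see what a bias correction of this kind looks like, here is a minimal sketch of the commonly cited form of Bergsma's corrected Cramér's V. It is an assumption that the report uses exactly this formula; the function name `cramers_v_corrected` and the toy category labels are hypothetical.

```python
import math
from collections import Counter

def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("a and b must have the same length (at least 2)")
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row, row_n in row_totals.items():
        for col, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrections: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # a variable with a single category carries no association
    return math.sqrt(phi2_corr / denom)

# Toy labels only (not taken from the dataset).
v_perfect = cramers_v_corrected(["x", "y"] * 50, ["p", "q"] * 50)
v_independent = cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25)
```

With the correction, two variables that are independent in the sample yield exactly 0, whereas the uncorrected estimator tends to drift above 0 on noisy samples.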

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
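
The nullity correlation mentioned for the heatmap can be illustrated with a small sketch. One common way to compute it, assumed here, is the Pearson correlation between the "is missing" indicators of two columns. The function name `nullity_correlation`, the use of `None` for missing cells, and the toy columns are all hypothetical.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 0/1 'is missing' indicators of two columns.

    Returns None when either column is fully present or fully missing,
    because the indicator is then constant and the correlation is undefined.
    """
    miss_a = [1.0 if v is None else 0.0 for v in col_a]
    miss_b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(miss_a)
    mean_a = sum(miss_a) / n
    mean_b = sum(miss_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(miss_a, miss_b))
    var_a = sum((a - mean_a) ** 2 for a in miss_a)
    var_b = sum((b - mean_b) ** 2 for b in miss_b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)

# Toy columns only (not taken from the dataset); None marks a missing cell.
col_1 = [13, None, 24, None, 19, 34]
col_2 = [0.5, None, 0.4, None, 1.8, 1.2]      # missing in the same rows as col_1
col_3 = [None, 2, None, 5, None, None]        # present exactly where col_1 is missing
col_4 = [1, 2, 3, 4, 5, 6]                    # no missing values

same_pattern = nullity_correlation(col_1, col_2)
opposite_pattern = nullity_correlation(col_1, col_3)
undefined = nullity_correlation(col_1, col_4)
```

A value near +1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a fully populated column has no defined nullity correlation.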

Sample

First rows

PY | SC | ArtsHumanities | LifeSciencesBiomedicine | PhysicalSciences | SocialSciences | Technology | ComputerScience | Health | NR | TCperYear | NumAuthors | Organisation | Region | Country | CountryCode
01998Computer Science; Engineering0.00.00.00.01.010130.5454553CollaborationNorth AmericaUSAUSA
11998Computer Science; Engineering0.00.00.00.01.010240.3636362CollaborationSouthEast Asia and PacificAustraliaAUS
21998Computer Science; Engineering0.00.00.00.01.010133.4545454CollaborationWestern EuropeBelgiumBEL
31998Computer Science; Engineering0.00.00.00.01.010133.4545454CollaborationWestern EuropeFranceFRA
41998Computer Science; Engineering0.00.00.00.01.010120.1363642AcademiaNorth AmericaUSAUSA
51998Computer Science; Engineering0.00.00.00.01.010191.7727272CollaborationSouth AsiaIndiaIND
61998Computer Science; Engineering0.00.00.00.01.010341.1818182AcademiaNorth AmericaUSAUSA
71998Computer Science; Engineering0.00.00.00.01.010230.1818183AcademiaWestern EuropeUnited KingdomGBR
81998Computer Science; Engineering0.00.00.00.01.010143.1363643CollaborationWestern EuropeGermanyDEU
91998Computer Science; Engineering0.00.00.00.01.010170.4090913AcademiaNorth AmericaUSAUSA

Last rows

PY | SC | ArtsHumanities | LifeSciencesBiomedicine | PhysicalSciences | SocialSciences | Technology | ComputerScience | Health | NR | TCperYear | NumAuthors | Organisation | Region | Country | CountryCode
2875342017General & Internal Medicine0.01.00.00.00.001210.6666672AcademiaWestern EuropeUnited KingdomGBR
2875352018Business & Economics; Biomedical Social Sciences0.00.00.01.00.00030.0000001AcademiaNorthEast AsiaChinaCHN
2875362016Cultural Studies0.00.00.01.00.000320.0000001AcademiaSouthEast Asia and PacificAustraliaAUS
2875372018History1.00.00.00.00.000660.0000001AcademiaWestern EuropeNetherlandsNLD
2875382015Business & Economics0.00.00.01.00.000530.2000001CollaborationWestern EuropeUnited KingdomGBR
2875392015Business & Economics0.00.00.01.00.000330.2000002CollaborationNorth AmericaCanadaCAN
2875402009Business & Economics0.00.00.01.00.000600.0000002AcademiaNorth AmericaUSAUSA
2875412001Literature1.00.00.00.00.000222.3684212AcademiaNorth AmericaUSAUSA
2875422018Literature1.00.00.00.00.000340.0000001AcademiaWestern EuropeSwitzerlandCHE
2875432017Literature1.00.00.00.00.000450.0000001AcademiaWestern EuropeUnited KingdomGBR